Domain-specific Stop Words in Malaysian Parliamentary Debates 1959 – 2018

Abstract

Removal of stop words is essential in Natural Language Processing and text-related analysis. Existing works on Malay stop words are based on standard Malay and on Quranic/Arabic translations into Malay. Thus, there is a lack of a domain-specific stop word list, making it discordant for processing parliamentary discourse. In this paper, we propose a semantic approach towards identifying and removing stop words in Malay, conventional Malay spelling and English functional words by analysing a time-series corpus, namely the Malaysian Hansard Corpus (MHC), to extract a specific-domain stop word list. The study utilised a combination of the Z-method, the most frequently occurring words, words that appear once, and the classic method. The dataset of the corpus evaluated comprised Parliament 1 (year 1959) to Parliament 13 (year 2018). The stop words were then categorised into a list according to related words. The resulting list contains 587 stop words. New stop words that emerged from the MHC include parliamentary-related words like ‘Berhormat’ (salutation for members of Parliament), ‘Pertua’ (Speaker of the House), ‘ketawa’ (laugh) and ‘tepuk’ (clap). Other than typical English stop words like ‘and’ and ‘the’, the list also includes ‘hon’ble’ (short for ‘Honourable’) and ‘honourable’. It also includes conventional Malay spellings like ‘untok’ (for), ‘lebeh’ (more) and ‘kapada’ (to). The proposed set of stop words can be used to further assist natural language processing of text...
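As a rough illustration of two of the frequency-based ingredients named in the abstract (the most frequently occurring words and the words that appear once), the following minimal Python sketch counts tokens over a toy corpus. It is not the authors' pipeline: the function name `candidate_stop_words`, the `top_n` cut-off, the tokenisation rule and the sample sentences are all assumptions made for this example, and the Z-method and the classic method are not implemented here.

```python
from collections import Counter
import re


def candidate_stop_words(documents, top_n=3):
    """Return (most_frequent, singletons) candidate stop words.

    Hypothetical helper: `top_n` and the tokenisation rule are
    illustrative assumptions, not values taken from the study.
    """
    counts = Counter()
    for text in documents:
        # Lowercase, then keep alphabetic runs with internal apostrophes or hyphens.
        counts.update(re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower()))
    # Candidates from the high-frequency end of the distribution.
    most_frequent = [word for word, _ in counts.most_common(top_n)]
    # Candidates that occur exactly once in the whole corpus.
    singletons = sorted(word for word, n in counts.items() if n == 1)
    return most_frequent, singletons


# Toy sentences for illustration only; they are not drawn from the MHC.
docs = [
    "yang berhormat untok dewan yang",
    "yang berhormat kapada dewan",
    "yang berhormat lebeh",
]
frequent, once = candidate_stop_words(docs, top_n=2)
print(frequent)
print(once)
```

In practice the two candidate lists would still need manual or semantic review before being treated as stop words, since a word that is frequent or rare in one corpus is not necessarily uninformative.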

Similar resources

Exemelification of Parliamentary Debates

Parliamentary debates are an interesting domain in which to apply state-of-the-art information retrieval technology. Parliamentary debates are highly structured transcripts of meetings of politicians in parliament. These debates are an important part of the cultural heritage of countries; they are often free of copyright; citizens often have a legal right to inspect them; and several countries make gre...

Modelling argumentation in parliamentary debates

In this paper we apply the information state update (ISU) machinery to tracking and understanding the argumentative behaviour of participants in a parliamentary debate in order to predict its outcome. We propose to use the ISU approach to model the arguments of the debaters and the support/attack links between them as part of the formal representations of a participant’s information state. We f...

Advanced Information Access to Parliamentary Debates

Parliamentary debates are highly structured transcripts of meetings of politicians in parliament. These debates are an important part of the cultural heritage of many countries; they are often free of copyright; citizens often have a legal right to inspect them; and several countries make great effort to digitize their entire historical collection and make it available to the general public. T...

Bringing parliamentary debates to the Semantic Web

An analysis of parliamentary debates and media resources that cover them can provide insight into the political climate of a country. Although debates are now regularly published on official government portals, their analysis remains a cumbersome and challenging task for historians and political scientists. One of the main tasks of the PoliMedia project is to allow easy crossmedia comparisons a...

Journal

Journal title: GEMA Online Journal of Language Studies

Year: 2021

ISSN: 2550-2131, 1675-8021

DOI: https://doi.org/10.17576/gema-2021-2102-01